By Diana L. Magnuson, Curator and Historian, ISRDI
IPUMS now has an online IPUMS Document Collection for our ancillary census and survey materials collected by IPUMS International!
In 1999, with a social science infrastructure grant from the National Science Foundation (NSF), IPUMS International had a simple yet audaciously ambitious goal: preserve the world’s microdata resources and democratize access to those sources. Twenty-five years later, the project's goals remain the same: collecting and preserving census and survey data and documentation, harmonizing those data, and disseminating the harmonized data free of charge.
IPUMS-I amassed tens of thousands of ancillary materials in support of its data harmonization work. These materials came from partner organizations: the United States Census Bureau (USCB), the United Nations Statistics Division (UNSD), the Latin American and Caribbean Demographic Centre (CELADE), the East-West Center, the Centre Population et Développement (CEPED), and more than one hundred national statistical agencies. Examples of this material include correspondence, maps, enumerator instructions, supervisor instructions, training materials, codebooks, publicity, reports, newspaper clippings, unpublished papers, census timetables, data processing materials, and technical manuals. The ancillary materials in the IPUMS collection attest to the varied technical, business, social, and economic aspects of conducting censuses and surveys across time and space.
A portion of IPUMS-I grant money has funded the curation and preservation of the ancillary materials acquired by the project. For over two decades, archival staff have been preserving thousands of unique pieces of census and survey documentation, creating bibliographic records using an extended Dublin Core profile that supports the use of controlled vocabularies to enhance findability for the project staff and outside users. The goal of this work was the creation of a simple, findable, searchable, and downloadable document access system.
An early effort to make basic international census documentation available was the World Population Census Forms webpage. Its primary aim was to demonstrate the scope and capacity of IPUMS-I to potential international data partners. It provided a tabular list of census forms organized by country and year, with minimal functionality: hyperlinks for downloading the documents. While useful for organizing, identifying, and accessing standard international census forms, this static, utilitarian webpage prompted us to consider how we could provide access to the thousands of additional materials entrusted to us. We needed a simple but more sophisticated tool for stewarding our archival resources.
Beginning in 1999, ancillary documents streamed into the archive as a product of IPUMS-I acquisition efforts. IPUMS-I may be the only holder of some of these materials. A portion of the materials arrived with the express understanding that they would be made available through some means once their role in the IPUMS-I microdata harmonization work had concluded. Other acquired materials extended beyond the IPUMS-I data collection, covering countries, censuses, and surveys for which IPUMS-I does not currently disseminate microdata.
Data curator Wendy Thomas recognized the significance of these materials and anticipated the day when an archival document access system would be a reality. Thomas’ past experience providing research support for social science data users and her technical expertise curating data and metadata positioned her well for envisioning an online IPUMS document access system. Thomas immediately took steps to assemble the scaffolding necessary to build a flexible metadata warehouse. From the beginning, Thomas advocated for a simple online tool that would provide basic metadata and discoverable, accessible, searchable, and downloadable PDFs of digitized archival materials.
Building a metadata warehouse for the IPUMS Document Collection was years in the making. First, Thomas built the scaffolding for a logical intake process and workflow for digital and manuscript materials. Next, Thomas developed a process for creating structured bibliographic records in a text editor for each piece of archival material. Using an extended Dublin Core profile, Thomas tailored a controlled vocabulary by hand to the archival dimensions of the IPUMS-I ancillary materials. To date, more than forty undergraduate students have helped digitize, clean, and create metadata for the IPUMS Document Collection.
The steady growth of the IPUMS-I data project, rising expectations from external funding organizations regarding IPUMS preservation practices, a transition in IPUMS archive leadership following Thomas’ retirement, and an opening in the IPUMS-IT Product Team's quarterly calendar eventually led to concrete steps toward building an online document access system. The IPUMS-IT Product Team had periodically evaluated existing document access products and each time concluded that they would require too much modification to make use of our controlled vocabularies. IPUMS would therefore build its own web interface.
The beta version of the IPUMS Document Collection launched in early June 2023 with metadata from a single region, Oceania. PDF documents and metadata from this region served as a test dataset for assessing the quality and content of the metadata records and for evaluating how well these records supported the functionality of the web interface. The Oceania dataset is small (9 countries, 626 document records, 29 collection records, and 653 PDF files), but it is representative of the overall IPUMS-I collection and served as a diagnostic during development of the web interface.
In late October 2024, a new region was added to the IPUMS Document Collection: Africa. The IPUMS Document Collection is now live on ipums.org with over 7,900 searchable and downloadable documents. Stay tuned! The region of Asia will be added in 2025, with Europe and the Americas in due course.
Some content in this blog post was previously published in:
Diana L. Magnuson (March 2024) “Stewarding Our Resources: Building a Sustainable IPUMS Archival Document Access System,” IASSIST Quarterly, 48(1).